Benchmark Visual Question Answer Models by using Focus Map

نویسندگان

  • Wenda Qiu
  • Yueyang Xianzang
  • Zhekai Zhang
چکیده

Inferring and Executing Programs for Visual Reasoning proposes a model for visual reasoning that consists of a program generator and an execution engine to avoid endto-end models. To show that the model actually learn which objects to focus on to answer the questions, the authors give a visualizations of the norm of the gradient of the sum of the predicted answer scores with respect to the final feature map. However, the authors does not evaluate the efficiency of focus map. This paper purposed a method for evaluating it. We generate several kinds of questions to test different keywords. We infer focus maps from the model by asking these questions and evaluate them by comparing with the segmentation graph. Furthermore, this method can be applied to any model if focus maps can be inferred from it. By evaluating focus map of different models on the CLEVR dataset, we will show that CLEVR-iep model has learned where to focus more than end-to-end models. The main contribution of this paper is that we propose a focus-map-based evaluating method for visual reasoning.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Question Answering Using Enhanced Lexical Semantic Models

In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardl...

متن کامل

Segmentation Guided Attention Networks for Visual Question Answering

In this paper we propose to solve the problem of Visual Question Answering by using a novel segmentation guided attention based network which we call SegAttendNet. We use image segmentation maps, generated by a Fully Convolutional Deep Neural Network to refine our attention maps and use these refined attention maps to make the model focus on the relevant parts of the image to answer a question....

متن کامل

What is the fastest car in the world ?

In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardl...

متن کامل

Differential Attention for Visual Question Answering

In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that...

متن کامل

Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool

In recent years, visual question answering (VQA) has become topical. The premise of VQA’s significance as a benchmark in AI, is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ‘understand’ less than initially hoped, and instead master the easier task of exploiting cues given away ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1801.05302  شماره 

صفحات  -

تاریخ انتشار 2018